Multimodal signal processing in naturalistic noisy environments

Author

  • Sharon L. Oviatt
Abstract

When a system must process spoken language in natural environments that involve different types and levels of noise, the problem of supporting robust recognition is a very difficult one. In the present studies, over 2,600 multimodal utterances were collected during both mobile and stationary use of a multimodal pen/voice system. The results confirmed that multimodal signal processing supports significantly improved robustness over spoken language processing alone, with the largest improvement during mobile use. The multimodal architecture decreased the spoken language error rate by 19-35%. In addition, data collected on a command-by-command basis while users were mobile emphasized the adverse impact of users’ Lombard adaptation on system processing, even when a noise-canceling microphone was used. Implications of these findings are discussed for improving the reliability and stability of spoken language processing in mobile environments.


Similar resources

Improving speech recognition performance in noisy environments

Speech recognition errors affect the performance of multimodal systems especially in noisy conditions. For that reason the use of speech recognition confidence score and time stamps is suggested. This paper deals with the impact of a signal enhancement method on these speech recognition’s parameters, in the presence of noise. By recognising noisy signals not only the word error rate (WER) incre...


Multimodal exemplar-based voice conversion using lip features in noisy environments

This paper presents a multimodal voice conversion (VC) method for noisy environments. In our previous exemplar-based VC method, source exemplars and target exemplars are extracted from parallel training data, in which the same texts are uttered by the source and target speakers. The input source signal is then decomposed into source exemplars, noise exemplars obtained from the input signal, and ...


Towards Next-Generation Lip-Reading Driven Hearing-Aids: A preliminary Prototype Demo

Speech enhancement aims to enhance the perceived speech quality and intelligibility in the presence of noise. Classical speech enhancement methods are mainly based on audio-only processing, which often performs poorly in adverse conditions, where overwhelming noise is present. This paper presents an interactive prototype demo, as part of a disruptive cognitively-inspired multimodal hearing-aid bei...


Multimodal voice conversion based on non-negative matrix factorization

A multimodal voice conversion (VC) method for noisy environments is proposed. In our previous non-negative matrix factorization (NMF)-based VC method, source and target exemplars are extracted from parallel training data, in which the same texts are uttered by the source and target speakers. The input source signal is then decomposed into source exemplars, noise exemplars, and their weights. Th...


Semantic Fusion for Biometric User Authentication as Multimodal Signal Processing

Today the application of multimodal biometric systems is a common way to overcome the problems, which come with unimodal systems, such as noisy data, attacks, overlapping of similarities, and nonuniversality of biometric characteristics. In order to fuse multiple identification sources simultaneously, fusion strategies can be applied on different levels. This paper presents a theoretical concep...




Publication date: 2000